1.
bioRxiv ; 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38562756

ABSTRACT

Rare variants, which make up the vast majority of human genetic variants, are likely to have a more deleterious impact on human diseases than common variants. Here we present the carrier statistic, a statistical framework that prioritizes disease-related rare variants by integrating gene expression data. By quantifying the impact of rare variants on gene expression, the carrier statistic can prioritize rare variants that have large functional consequences in diseased patients. Through simulation studies and analysis of a real multi-omics dataset, we demonstrated that the carrier statistic is applicable to studies with limited sample sizes (a few hundred) and achieves substantially higher sensitivity than existing rare-variant association methods. Application to Alzheimer's disease reveals 16 rare variants within 15 genes with extreme carrier statistics. The carrier statistic method can be applied to various rare variant types and is adaptable to other omics data modalities, offering a powerful tool for investigating the molecular mechanisms underlying complex diseases.
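
The abstract does not state how the carrier statistic is computed. The sketch below is a rough, hypothetical illustration of scoring a rare variant by its effect on expression in the people who carry it. It standardizes each carrier's expression of the affected gene against the non-carriers and then averages the resulting z-scores. The function name `carrier_score`, this simplified definition, and the toy data are assumptions and are not the published method.

```python
from statistics import mean, stdev


def carrier_score(expression, carriers):
    """Hypothetical, simplified carrier score (not the published statistic).

    expression: dict mapping sample ID -> expression of the variant's gene
    carriers:   set of sample IDs that carry the rare variant
    Returns the mean z-score of carriers relative to non-carriers.
    """
    background = [v for s, v in expression.items() if s not in carriers]
    if len(background) < 2:
        raise ValueError("need at least two non-carriers to estimate spread")
    mu, sd = mean(background), stdev(background)
    if sd == 0:
        raise ValueError("non-carrier expression has zero variance")
    # Standardize each carrier against the non-carrier distribution
    z_scores = [(expression[s] - mu) / sd for s in carriers]
    return mean(z_scores)


# Toy data: one carrier (s6) with strongly elevated expression
expr = {"s1": 5.0, "s2": 5.2, "s3": 4.8, "s4": 5.1, "s5": 4.9, "s6": 9.0}
print(round(carrier_score(expr, {"s6"}), 1))  # → 25.3
```

A score far from zero flags a variant whose carriers sit in the tail of the expression distribution. The published method's calibration and significance assessment are not reproduced here.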

2.
Genome Biol ; 25(1): 1, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38167462

ABSTRACT

BACKGROUND: The vast majority of findings from human genome-wide association studies (GWAS) map to non-coding sequences, complicating their mechanistic interpretations and clinical translations. Non-coding sequences that are evolutionarily conserved and biochemically active could offer clues to the mechanisms underpinning GWAS discoveries. However, genetic effects of such sequences have not been systematically examined across a wide range of human tissues and traits, hampering progress to fully understand regulatory causes of human complex traits. RESULTS: Here we develop a simple yet effective strategy to identify functional elements exhibiting high levels of human-mouse sequence conservation and enhancer-like biochemical activity, which scales well to 313 epigenomic datasets across 106 human tissues and cell types. Combined with 468 GWAS of European (EUR) and East Asian (EAS) ancestries, these elements show tissue-specific enrichments of heritability and causal variants for many traits, which are significantly stronger than enrichments based on enhancers without sequence conservation. These elements also help prioritize candidate genes that are functionally relevant to body mass index (BMI) and schizophrenia but were not reported in previous GWAS with large sample sizes. CONCLUSIONS: Our findings provide a comprehensive assessment of how sequence-conserved enhancer-like elements affect complex traits in diverse tissues and demonstrate a generalizable strategy of integrating evolutionary and biochemical data to elucidate human disease genetics.


Subjects
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Mice , Animals , Epigenomics , Phenotype , Enhancer Elements, Genetic , Polymorphism, Single Nucleotide
3.
bioRxiv ; 2024 Feb 03.
Article in English | MEDLINE | ID: mdl-37502861

ABSTRACT

The inherent similarities between natural language and biological sequences have given rise to great interest in adapting the transformer-based large language models (LLMs) underlying recent breakthroughs in natural language processing for applications in genomics. However, current LLMs for genomics suffer from several limitations, such as the inability to include chromatin interactions in the training data and the inability to make predictions in new cellular contexts not represented in the training data. To mitigate these problems, we propose EpiGePT, a transformer-based pretrained language model for predicting context-specific epigenomic signals and chromatin contacts. By taking the context-specific activities of transcription factors (TFs) and 3D genome interactions into consideration, EpiGePT offers wider applicability and deeper biological insights than models trained on DNA sequence only. In a series of experiments, EpiGePT demonstrates superior performance on a diverse set of epigenomic signal prediction tasks compared with existing methods. In particular, our model enables cross-cell-type prediction of long-range interactions and offers insight into the functional impact of genetic variants under different cellular contexts. These new capabilities will enhance the usefulness of LLMs in the study of gene regulatory mechanisms. We provide a free online prediction service for EpiGePT at http://health.tsinghua.edu.cn/epigept/.

4.
Hum Mol Genet ; 32(21): 3105-3120, 2023 10 17.
Article in English | MEDLINE | ID: mdl-37584462

ABSTRACT

DNA methyltransferase type 1 (DNMT1) is a major enzyme involved in maintaining the methylation pattern after DNA replication. Mutations in DNMT1 have been associated with autosomal dominant cerebellar ataxia, deafness and narcolepsy (ADCA-DN). We used fibroblasts, induced pluripotent stem cells (iPSCs) and induced neurons (iNs) generated from patients with ADCA-DN and from controls to explore the epigenomic and transcriptomic effects of mutations in DNMT1. We show cell type-specific changes in gene expression and DNA methylation patterns. DNA methylation and gene expression changes were negatively correlated in iPSCs and iNs. In addition, we identified a group of genes associated with clinical phenotypes of ADCA-DN, including PDGFB and PRDM8 for cerebellar ataxia, psychosis and dementia, and NR2F1 for deafness and optic atrophy. Furthermore, ZFP57, which is required to maintain gene imprinting through DNA methylation during early development, was hypomethylated at promoters and showed upregulated expression in patients with ADCA-DN in both iPSCs and iNs. Our results provide insight into the functions of DNMT1 and the molecular changes associated with ADCA-DN, with potential implications for genes associated with related phenotypes.


Subjects
Cerebellar Ataxia , Deafness , Humans , Cerebellar Ataxia/genetics , DNA (Cytosine-5-)-Methyltransferases/genetics , Transcriptome/genetics , Epigenomics , DNA (Cytosine-5-)-Methyltransferase 1/genetics , DNA Methylation/genetics , Deafness/genetics , Mutation , DNA
5.
Proc Natl Acad Sci U S A ; 120(28): e2305236120, 2023 07 11.
Article in English | MEDLINE | ID: mdl-37399400

ABSTRACT

Plasma cell-free DNA (cfDNA) is a noninvasive biomarker of cell death in all organs. Deciphering the tissue origin of cfDNA can reveal abnormal cell death caused by disease, which has great clinical potential for disease detection and monitoring. Despite this promise, sensitive and accurate quantification of tissue-derived cfDNA remains challenging for existing methods because of the limited characterization of tissue methylation and the reliance on unsupervised methods. To fully exploit the clinical potential of tissue-derived cfDNA, we present one of the largest comprehensive, high-resolution methylation atlases, based on 521 noncancer tissue samples spanning 29 major types of human tissues. We systematically identified fragment-level tissue-specific methylation patterns and extensively validated them in orthogonal datasets. Based on this rich tissue methylation atlas, we developed the first supervised tissue deconvolution approach, a deep-learning-powered model called cfSort, for sensitive and accurate tissue deconvolution in cfDNA. On the benchmarking data, cfSort showed superior sensitivity and accuracy compared with existing methods. We further demonstrated the clinical utility of cfSort with two potential applications: aiding disease diagnosis and monitoring treatment side effects. The tissue-derived cfDNA fraction estimated by cfSort reflected the clinical outcomes of the patients. In summary, the tissue methylation atlas and cfSort improved the performance of tissue deconvolution in cfDNA, thus facilitating cfDNA-based disease detection and longitudinal treatment monitoring.


Subjects
Cell-Free Nucleic Acids , Deep Learning , Humans , Cell-Free Nucleic Acids/genetics , DNA Methylation , Biomarkers , Promoter Regions, Genetic , Biomarkers, Tumor/genetics
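
cfSort itself is a supervised deep-learning model, and the abstract does not describe its architecture. For orientation only, the sketch below shows the classical reference-based formulation of tissue deconvolution. It treats a cfDNA methylation profile as a non-negative mixture of tissue reference profiles and solves for the mixing fractions. It minimizes the least-squares error by projected gradient descent, which is a swapped-in textbook technique and not cfSort's model. The function name, reference matrix, and mixture values are hypothetical.

```python
def deconvolve(reference, mixture, steps=5000, lr=0.1):
    """Estimate tissue fractions f >= 0 that minimize ||R f - m||^2.

    reference: one row per marker; row[j] is the (hypothetical) methylation
               level of that marker in tissue j
    mixture:   observed methylation level of each marker in a cfDNA sample
    Uses projected gradient descent, then rescales f to sum to 1.
    """
    n_markers, n_tissues = len(reference), len(reference[0])
    f = [1.0 / n_tissues] * n_tissues  # start from equal fractions
    for _ in range(steps):
        # residual r = R f - m
        resid = [sum(reference[i][j] * f[j] for j in range(n_tissues)) - mixture[i]
                 for i in range(n_markers)]
        # gradient of 0.5 * ||R f - m||^2 is R^T r
        grad = [sum(reference[i][j] * resid[i] for i in range(n_markers))
                for j in range(n_tissues)]
        # gradient step, then project onto the non-negative orthant
        f = [max(0.0, f[j] - lr * grad[j]) for j in range(n_tissues)]
    total = sum(f)
    return [x / total for x in f] if total > 0 else f


# Toy atlas: 3 markers x 3 tissues, each marker mostly methylated in one tissue
R = [[0.9, 0.1, 0.1],
     [0.1, 0.9, 0.1],
     [0.1, 0.1, 0.9]]
m = [0.66, 0.26, 0.18]  # equals R times the fractions [0.7, 0.2, 0.1]
print([round(x, 3) for x in deconvolve(R, m)])  # → [0.7, 0.2, 0.1]
```

The fixed step size must stay below 2 divided by the largest eigenvalue of RᵀR for the iteration to converge. Plain projected gradient is used here only to keep the sketch dependency-free; a dedicated non-negative least-squares solver would be the usual choice in practice.
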
6.
IEEE/ACM Trans Comput Biol Bioinform ; 20(2): 1384-1394, 2023.
Article in English | MEDLINE | ID: mdl-35503836

ABSTRACT

Deciphering the free energy landscape of biomolecular structure space is crucial for understanding many complex molecular processes, such as protein-protein interaction, RNA folding and protein folding. A major source of current dynamic structural data is Molecular Dynamics (MD) simulations. Several methods have been proposed to investigate the free energy landscape from MD data, but all of them rely on the assumption that kinetic similarity is associated with global geometric similarity, which may lead to unsatisfactory results. In this paper, we propose a new method called Conditional Angle Partition Tree to reveal the hierarchical free energy landscape by correlating local geometric similarity with kinetic similarity. Applied to benchmark alanine dipeptide MD data, it performed much better than existing methods in exploring and understanding the free energy landscape. We also applied it to MD data of Villin HP35. Our results are more reasonable in various respects than those from other methods and are very informative about the hierarchical structure of its energy landscape.


Subjects
Benchmarking , Trees , Dipeptides , Kinetics , Molecular Dynamics Simulation , Protein Folding , Thermodynamics
7.
Nucleic Acids Res ; 51(D1): D159-D166, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36215037

ABSTRACT

Elucidating the role of the 3D architecture of DNA in gene regulation is crucial for understanding cell differentiation, tissue homeostasis and disease development. Among various chromatin conformation capture methods, HiChIP has received increasing attention for its significant improvement over other methods in profiling regulatory (e.g. H3K27ac) and structural (e.g. cohesin) interactions. To facilitate the study of 3D regulatory interactions, we developed a HiChIP interactions database, HiChIPdb (http://health.tsinghua.edu.cn/hichipdb/). The current version of HiChIPdb contains ∼262M annotated HiChIP interactions from 200 high-throughput HiChIP samples across 108 cell types. The functionalities of HiChIPdb include: (i) standardized categorization of HiChIP interactions in a hierarchical structure based on organ, tissue and cell line and (ii) comprehensive annotation of HiChIP interactions with regulatory genes and GWAS Catalog SNPs. To the best of our knowledge, HiChIPdb is the first comprehensive database that uses a unified pipeline to map functional interactions across diverse cell types and tissues at different resolutions. We believe this database has the potential to advance cutting-edge research on regulatory mechanisms in development and disease by removing barriers to data aggregation, preprocessing and analysis.


Subjects
Chromatin , DNA , Cell Line , Chromatin/genetics , Gene Expression Regulation , Sequence Analysis, DNA/methods , Databases, Genetic
8.
Elife ; 11, 2022 12 16.
Article in English | MEDLINE | ID: mdl-36525361

ABSTRACT

Systems genetics holds the promise to decipher complex traits by interpreting their associated SNPs through gene regulatory networks derived from comprehensive multi-omics data of cell types, tissues, and organs. Here, we propose SpecVar to integrate paired chromatin accessibility and gene expression data into context-specific regulatory network atlas and regulatory categories, conduct heritability enrichment analysis with genome-wide association studies (GWAS) summary statistics, identify relevant tissues, and estimate relevance correlation to depict common genetic factors acting in the shared regulatory networks between traits. Our method improves power upon existing approaches by associating SNPs with context-specific regulatory elements to assess heritability enrichments and by explicitly prioritizing gene regulations underlying relevant tissues. Ablation studies, independent data validation, and comparison experiments with existing methods on GWAS of six phenotypes show that SpecVar can improve heritability enrichment, accurately detect relevant tissues, and reveal causal regulations. Furthermore, SpecVar correlates the relevance patterns for pairs of phenotypes and better reveals shared SNP-associated regulations of phenotypes than existing methods. Studying GWAS of 206 phenotypes in UK Biobank demonstrates that SpecVar leverages the context-specific regulatory network atlas to prioritize phenotypes' relevant tissues and shared heritability for biological and therapeutic insights. SpecVar provides a powerful way to interpret SNPs via context-specific regulatory networks and is available at https://github.com/AMSSwanglab/SpecVar, copy archived at swh:1:rev:cf27438d3f8245c34c357ec5f077528e6befe829.


Subjects
Gene Regulatory Networks , Genome-Wide Association Study , Phenotype , Gene Expression Regulation , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide
10.
Nat Commun ; 13(1): 5566, 2022 09 29.
Article in English | MEDLINE | ID: mdl-36175411

ABSTRACT

Early cancer detection by cell-free DNA faces multiple challenges: low fraction of tumor cell-free DNA, molecular heterogeneity of cancer, and sample sizes that are not sufficient to reflect diverse patient populations. Here, we develop a cancer detection approach to address these challenges. It consists of an assay, cfMethyl-Seq, for cost-effective sequencing of the cell-free DNA methylome (with > 12-fold enrichment over whole genome bisulfite sequencing in CpG islands), and a computational method to extract methylation information and diagnose patients. Applying our approach to 408 colon, liver, lung, and stomach cancer patients and controls, at 97.9% specificity we achieve 80.7% and 74.5% sensitivity in detecting all-stage and early-stage cancer, and 89.1% and 85.0% accuracy for locating tissue-of-origin of all-stage and early-stage cancer, respectively. Our approach cost-effectively retains methylome profiles of cancer abnormalities, allowing us to learn new features and expand to other cancer types as training cohorts grow.


Subjects
Cell-Free Nucleic Acids , Stomach Neoplasms , Cell-Free Nucleic Acids/genetics , Cost-Benefit Analysis , Early Detection of Cancer , Epigenome , Humans , Stomach Neoplasms/diagnosis , Stomach Neoplasms/genetics
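
The cfMethyl-Seq results pair a sensitivity with a fixed specificity (80.7% sensitivity at 97.9% specificity). The sketch below shows the generic way such a pair is read off classifier scores. It picks the score threshold that keeps the required fraction of controls at or below it, then measures the fraction of cancer samples above it. The function name and scores are made-up toy values, and the sketch does not reproduce the paper's classifier.

```python
import math


def sensitivity_at_specificity(control_scores, case_scores, target_spec):
    """Call a sample positive if its score exceeds a threshold chosen so that
    at least `target_spec` of controls score at or below it.

    Returns (threshold, achieved specificity, sensitivity).
    """
    controls = sorted(control_scores)
    # smallest number of controls that must lie at or below the threshold
    k = max(1, math.ceil(target_spec * len(controls)))
    threshold = controls[k - 1]
    spec = sum(s <= threshold for s in controls) / len(controls)
    sens = sum(s > threshold for s in case_scores) / len(case_scores)
    return threshold, spec, sens


# Toy scores: 100 controls scored 0..99, five cancer samples
controls = list(range(100))
cases = [50, 98, 99, 120, 150]
print(sensitivity_at_specificity(controls, cases, 0.979))  # → (97, 0.98, 0.8)
```

Because specificity is measured on a finite control set, the achieved value (here 98%) can exceed the target. With 100 controls, specificity can only move in 1% steps.
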
11.
Science ; 377(6610): 1077-1085, 2022 09 02.
Article in English | MEDLINE | ID: mdl-35951677

ABSTRACT

Mammalian genomes have multiple enhancers spanning an ultralong distance (>megabases) to modulate important genes, but it is unclear how these enhancers coordinate to achieve this task. We combine multiplexed CRISPRi screening with machine learning to define quantitative enhancer-enhancer interactions. We find that the ultralong distance enhancer network has a nested multilayer architecture that confers functional robustness of gene expression. Experimental characterization reveals that enhancer epistasis is maintained by three-dimensional chromosomal interactions and BRD4 condensation. Machine learning prediction of synergistic enhancers provides an effective strategy to identify noncoding variant pairs associated with pathogenic genes in diseases beyond genome-wide association studies analysis. Our work unveils nested epistasis enhancer networks, which can better explain enhancer functions within cells and in diseases.


Subjects
Disease , Enhancer Elements, Genetic , Epistasis, Genetic , Machine Learning , Cell Cycle Proteins , Disease/genetics , Genome-Wide Association Study , Humans , K562 Cells , Nuclear Proteins/genetics , Transcription Factors/genetics
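
The enhancer-network study quantifies enhancer-enhancer interactions from multiplexed CRISPRi perturbations. One common way to express a pairwise interaction compares the effect of silencing two enhancers together with the effect expected if their single effects combined independently. Independence here means the effects multiply on expression, which is the same as adding in log space: score = log2(E_AB/E_0) - [log2(E_A/E_0) + log2(E_B/E_0)]. This definition and the function name are assumptions for illustration and are not necessarily the paper's own measure.

```python
import math


def epistasis_score(baseline, only_a, only_b, both):
    """Log2-scale interaction between silencing enhancers A and B.

    Arguments are target-gene expression levels: with no enhancer silenced
    (baseline), with only A, only B, or both silenced. Under independence the
    log2 fold changes add. Returns observed minus expected: 0 means no
    interaction; negative means the double knockdown lowers expression more
    than expected; positive means less than expected.
    """
    lfc_a = math.log2(only_a / baseline)
    lfc_b = math.log2(only_b / baseline)
    lfc_ab = math.log2(both / baseline)
    return lfc_ab - (lfc_a + lfc_b)


print(epistasis_score(100, 50, 50, 25))  # → 0.0 (matches the expectation)
print(epistasis_score(100, 50, 50, 50))  # → 1.0 (milder than expected)
```

Working in log space makes the independence baseline a simple sum, so deviations from zero can be compared across enhancer pairs with different single effects.
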
12.
iScience ; 25(8): 104790, 2022 Aug 19.
Article in English | MEDLINE | ID: mdl-35992073

ABSTRACT

Complex traits such as cardiovascular diseases (CVD) result from complicated processes jointly affected by genetic and environmental factors. Genome-wide association studies (GWAS) have identified genetic variants associated with diseases but usually do not reveal the underlying mechanisms. There could be many intermediate steps at the epigenetic, transcriptomic and cellular scales inside the black box of genotype-phenotype associations. In this article, we present GRPath, a machine-learning-based cross-scale framework that deciphers putative causal paths (pcPaths) from genetic variants to disease phenotypes by integrating multiple omics data. Applying GRPath to CVD, we identified 646 and 549 pcPaths linking putative causal regions, variants and gene expression in specific cell types for two types of heart failure, respectively. The findings suggest new insights into coronary heart disease. Our work advances the modeling of tissue- and cell type-specific cross-scale regulation to uncover the mechanisms behind disease-associated variants and provides new findings on the molecular mechanisms of CVD.

13.
Genome Biol ; 23(1): 114, 2022 05 16.
Article in English | MEDLINE | ID: mdl-35578363

ABSTRACT

Technological development has enabled the profiling of gene expression and chromatin accessibility from the same cell. We develop scREG, a dimension reduction methodology for single cell multiome data based on the concept of cis-regulatory potential. This concept is further used to construct subpopulation-specific cis-regulatory networks. The capability of inferring useful regulatory networks is demonstrated by a twofold increase in network inference accuracy compared with the Pearson correlation-based method and by a 27-fold enrichment of GWAS variants for inflammatory bowel disease in the cis-regulatory elements. The R package scREG provides comprehensive functions for single cell multiome data analysis.


Subjects
Chromatin , Regulatory Sequences, Nucleic Acid , Chromatin/genetics , Gene Expression , Gene Regulatory Networks , Single-Cell Analysis
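
The scREG abstract benchmarks network inference against a Pearson correlation-based method. That kind of baseline scores a candidate cis-regulatory link by correlating a peak's accessibility with a gene's expression across the cells of the same multiome dataset. The sketch below implements only this baseline and not scREG's cis-regulatory potential. The function names, the `min_r` cutoff, and the peak and gene IDs are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_links(peak_access, gene_expr, min_r=0.5):
    """Baseline peak-gene linking: keep (peak, gene, r) with r >= min_r.

    peak_access: dict peak ID -> accessibility in each cell
    gene_expr:   dict gene ID -> expression in each cell (same cell order)
    """
    links = [(p, g, pearson(a, e))
             for p, a in peak_access.items()
             for g, e in gene_expr.items()]
    kept = [link for link in links if link[2] >= min_r]
    return sorted(kept, key=lambda link: -link[2])


# Toy multiome data for five cells (hypothetical peak and gene IDs)
peaks = {"peak1": [0, 1, 2, 3, 4], "peak2": [4, 3, 2, 1, 0]}
genes = {"geneA": [1, 2, 3, 4, 5], "geneB": [2, 1, 2, 1, 2]}
print(correlation_links(peaks, genes))  # → [('peak1', 'geneA', 1.0)]
```

Correlation alone cannot separate a regulatory link from co-variation driven by cell-type composition, which is one motivation for model-based scores such as the one the abstract describes.
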
14.
Genomics Proteomics Bioinformatics ; 20(3): 496-507, 2022 06.
Article in English | MEDLINE | ID: mdl-35293310

ABSTRACT

Although computational approaches have been complementing high-throughput biological experiments for the identification of functional regions in the human genome, it remains a great challenge to systematically decipher interactions between transcription factors (TFs) and regulatory elements to achieve interpretable annotations of chromatin accessibility across diverse cellular contexts. To solve this problem, we propose DeepCAGE, a deep learning framework that integrates sequence information and binding statuses of TFs, for the accurate prediction of chromatin accessible regions at a genome-wide scale in a variety of cell types. DeepCAGE takes advantage of a densely connected deep convolutional neural network architecture to automatically learn sequence signatures of known chromatin accessible regions and then incorporates such features with expression levels and binding activities of human core TFs to predict novel chromatin accessible regions. In a series of systematic comparisons with existing methods, DeepCAGE exhibits superior performance in not only the classification but also the regression of chromatin accessibility signals. In a detailed analysis of TF activities, DeepCAGE successfully extracts novel binding motifs and measures the contribution of a TF to the regulation with respect to a specific locus in a certain cell type. When applied to whole-genome sequencing data analysis, our method successfully prioritizes putative deleterious variants underlying a human complex trait and thus provides insights into the understanding of disease-associated genetic variants. DeepCAGE can be downloaded from https://github.com/kimmo1019/DeepCAGE.


Subjects
Chromatin Assembly and Disassembly , Chromatin , Deep Learning , Transcription Factors , Humans , Binding Sites , Chromatin/genetics , Chromatin/metabolism , Genome, Human , Protein Binding , Regulatory Sequences, Nucleic Acid , Transcription Factors/genetics , Transcription Factors/metabolism
15.
Proc Natl Acad Sci U S A ; 119(1), 2022 01 04.
Article in English | MEDLINE | ID: mdl-34930827

ABSTRACT

Abdominal aortic aneurysm (AAA) is a common degenerative cardiovascular disease whose pathobiology is not clearly understood. The cellular heterogeneity and cell-type-specific gene regulation of vascular cells in human AAA have not been well characterized. Here, we analyzed whole-genome sequencing data from AAA patients versus controls with the aim of detecting disease-associated variants that may affect gene regulation in human aortic smooth muscle cells (AoSMC) and human aortic endothelial cells (HAEC), two cell types of high relevance to AAA disease. To support this analysis, we generated H3K27ac HiChIP data for these cell types and inferred cell-type-specific gene regulatory networks. We observed that AAA-associated variants were most enriched in regulatory regions in AoSMC, compared with HAEC and CD4+ cells. The cell-type-specific regulation defined by these HiChIP data supported the importance of ERG and the KLF family of transcription factors in AAA disease. Analysis of regulatory elements that contain noncoding variants and are also differentially open between AAA patients and controls revealed the significance of the interleukin-6-mediated signaling pathway. This finding was further validated by incorporating information on the predicted deleteriousness of nonsynonymous single-nucleotide variants in AAA patients and additional control data from the Medical Genome Reference Bank dataset. These results provide important insights into AAA pathogenesis and a model for cell-type-specific analysis of disease-associated variants.


Subjects
Aortic Aneurysm, Abdominal/genetics , Gene Regulatory Networks , Case-Control Studies , Cells, Cultured , Down-Regulation , Humans , Interleukin-6/metabolism , Kruppel-Like Transcription Factors/genetics , Transcriptional Regulator ERG/genetics
16.
Nat Commun ; 12(1): 4763, 2021 08 06.
Article in English | MEDLINE | ID: mdl-34362918

ABSTRACT

The comparison of gene regulatory networks between diseased versus healthy individuals or between two different treatments is an important scientific problem. Here, we propose sc-compReg as a method for the comparative analysis of gene expression regulatory networks between two conditions using single cell gene expression (scRNA-seq) and single cell chromatin accessibility data (scATAC-seq). Our software, sc-compReg, can be used as a stand-alone package that provides joint clustering and embedding of the cells from both scRNA-seq and scATAC-seq, and the construction of differential regulatory networks across two conditions. We apply the method to compare the gene regulatory networks of an individual with chronic lymphocytic leukemia (CLL) versus a healthy control. The analysis reveals a tumor-specific B cell subpopulation in the CLL patient and identifies TOX2 as a potential regulator of this subpopulation.


Subjects
Gene Regulatory Networks , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Single-Cell Analysis/methods , B-Lymphocytes , Chromatin , Gene Expression Regulation, Neoplastic , HMGB Proteins , Humans , RNA, Small Cytoplasmic , Software
17.
Nat Commun ; 12(1): 4172, 2021 07 07.
Article in English | MEDLINE | ID: mdl-34234141

ABSTRACT

Cell-free DNA (cfDNA) is attractive for many applications, including cancer detection, tissue-of-origin identification and monitoring. A fundamental task underlying these applications is SNV calling from cfDNA, which is hindered by the very low tumor content. Thus, sensitive and accurate detection of low-frequency mutations (<5%) remains challenging for existing SNV callers. Here we present cfSNV, a method incorporating multi-layer error suppression and hierarchical mutation calling, to address this challenge. Furthermore, by leveraging cfDNA's comprehensive coverage of the tumor clonal landscape, cfSNV can profile mutations in subclones. In both simulated and real patient data, cfSNV outperforms existing tools in sensitivity while maintaining high precision. cfSNV enhances the clinical utility of cfDNA by improving mutation detection performance in medium-depth sequencing data, thereby making whole-exome sequencing (WES) a viable option. As an example, we demonstrate that the tumor mutation profile from cfDNA WES data can provide an effective biomarker to predict immunotherapy outcomes.


Subjects
Circulating Tumor DNA/genetics , DNA Mutational Analysis/methods , Exome Sequencing/methods , Immune Checkpoint Inhibitors/pharmacology , Neoplasms/genetics , Adult , Antibodies, Monoclonal, Humanized/pharmacology , Antibodies, Monoclonal, Humanized/therapeutic use , Biomarkers, Tumor/blood , Biomarkers, Tumor/genetics , Biopsy , Circulating Tumor DNA/blood , Computer Simulation , Datasets as Topic , Drug Resistance, Neoplasm/genetics , Female , Humans , Immune Checkpoint Inhibitors/therapeutic use , Male , Middle Aged , Mutation , Neoplasms/blood , Neoplasms/drug therapy , Neoplasms/mortality , Polymorphism, Single Nucleotide , Prognosis , Programmed Cell Death 1 Receptor/antagonists & inhibitors , Progression-Free Survival , Sensitivity and Specificity
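
The cfSNV abstract does not detail its multi-layer error suppression or hierarchical calling. The underlying statistical question can be illustrated with a one-sided binomial test: do the alternate-allele reads at a site exceed what sequencing error alone would produce? The per-base error rate, significance cutoff, read counts, and function names below are assumptions for illustration, and this is not cfSNV's algorithm.

```python
import math


def binomial_tail(alt_reads, depth, error_rate):
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate), summed in log
    space so large depths do not overflow. Assumes 0 < error_rate < 1."""
    log_p, log_q = math.log(error_rate), math.log1p(-error_rate)
    log_n_fact = math.lgamma(depth + 1)
    total = 0.0
    for k in range(alt_reads, depth + 1):
        log_term = (log_n_fact - math.lgamma(k + 1) - math.lgamma(depth - k + 1)
                    + k * log_p + (depth - k) * log_q)
        total += math.exp(log_term)
    return min(1.0, total)


def call_variant(alt_reads, depth, error_rate=1e-3, alpha=1e-6):
    """Hypothetical caller: flag a site when sequencing error alone is an
    implausible explanation for the observed alternate reads.
    Returns (called, p_value, variant_allele_fraction)."""
    p_value = binomial_tail(alt_reads, depth, error_rate)
    return p_value < alpha, p_value, alt_reads / depth


# 500x depth: 10 alt reads (2% VAF) vs. 2 alt reads (0.4% VAF)
print(call_variant(10, 500)[0])  # → True
print(call_variant(2, 500)[0])   # → False
```

This toy model assumes a single uniform error rate. Real cfDNA errors vary with sequence context and fragment properties, which is why dedicated callers add error-suppression layers on top of such a test.
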
18.
Nat Mach Intell ; 3(6): 536-544, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34179690

ABSTRACT

Recent advances in single-cell technologies, including single-cell ATAC-seq (scATAC-seq), have enabled large-scale profiling of the chromatin accessibility landscape at the single cell level. However, the characteristics of scATAC-seq data, including high sparsity and high dimensionality, have greatly complicated the computational analysis. Here, we proposed scDEC, a computational tool for single cell ATAC-seq analysis with deep generative neural networks. scDEC is built on a pair of generative adversarial networks (GANs), and is capable of learning the latent representation and inferring the cell labels, simultaneously. In a series of experiments, scDEC demonstrates superior performance over other tools in scATAC-seq analysis across multiple datasets and experimental settings. In downstream applications, we demonstrated that the generative power of scDEC helps to infer the trajectory and intermediate state of cells during differentiation and the latent features learned by scDEC can potentially reveal both biological cell types and within-cell-type variations. We also showed that it is possible to extend scDEC for the integrative analysis of multi-modal single cell data.

19.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-34180954

ABSTRACT

Multi-omics data allow us to select a small set of informative markers for discriminating specific cell types and studying cellular heterogeneity. However, it is often challenging to choose an optimal marker panel from high-dimensional molecular profiles for a large number of cell types. Here, we propose a method called Mixed Integer programming Model to Identify Cell type-specific marker panel (MIMIC). MIMIC maintains the hierarchical topology among different cell types and simultaneously maximizes the specificity of a fixed number of selected markers. MIMIC was benchmarked on the mouse ENCODE RNA-seq dataset, with 29 diverse tissues, for 43 surface markers (SMs) and 1345 transcription factors (TFs). MIMIC could select biologically meaningful markers and is robust to different accuracy criteria. It shows advantages over standard single gene-based approaches and widely used dimensionality reduction methods, such as multidimensional scaling and t-SNE, in both accuracy and biological interpretation. Furthermore, the combination of SMs and TFs achieves better specificity than SMs or TFs alone. Applying MIMIC to a large collection of 641 RNA-seq samples covering 231 cell types identifies a panel of TFs and SMs that reveals the modularity of cell type association networks. Finally, the scalability of MIMIC is demonstrated by selecting enhancer markers from mouse ENCODE data. MIMIC is freely available at https://github.com/MengZou1/MIMIC.


Subjects
Biomarkers , Computational Biology , Flow Cytometry/methods , Gene Expression Profiling/methods , Organ Specificity , Software , Algorithms , Computational Biology/methods , Databases, Genetic , Gene Expression Regulation , Humans , Organ Specificity/genetics , Reproducibility of Results
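
MIMIC formulates marker-panel selection as a mixed integer program over a hierarchy of cell types. The toy sketch below conveys only the combinatorial core: choosing a fixed number k of markers that best separate cell types. It does this by exhaustive search over a small expression matrix instead of with an integer-programming solver. It also uses a flat, non-hierarchical specificity score, which is an assumption of this illustration. The gene names are familiar markers, but the expression values are invented.

```python
from itertools import combinations


def specificity(expr, marker, cell_type):
    """Margin by which a marker's level in one cell type exceeds all others."""
    levels = expr[marker]
    return levels[cell_type] - max(v for t, v in levels.items() if t != cell_type)


def best_panel(expr, k):
    """Exhaustively choose k markers maximizing the summed per-cell-type
    specificity, crediting each cell type with its best marker in the panel.
    (A flat toy score; MIMIC itself solves a hierarchical integer program.)"""
    cell_types = list(next(iter(expr.values())))

    def score(panel):
        return sum(max(specificity(expr, m, t) for m in panel) for t in cell_types)

    return max(combinations(sorted(expr), k), key=score)


# Toy expression levels (illustrative numbers, not real measurements)
expr = {
    "CD3E": {"T": 9, "B": 1, "Mono": 1},
    "MS4A1": {"T": 1, "B": 8, "Mono": 1},
    "CD14": {"T": 1, "B": 1, "Mono": 7},
    "ACTB": {"T": 9, "B": 9, "Mono": 9},
}
print(best_panel(expr, 3))  # → ('CD14', 'CD3E', 'MS4A1')
```

Exhaustive search grows combinatorially with the number of candidate markers, so it is only viable for toy inputs like this one. Thousands of candidate TFs and surface markers are the setting where an integer-programming solver becomes necessary.
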
20.
Nat Commun ; 12(1): 2851, 2021 05 14.
Article in English | MEDLINE | ID: mdl-33990562

ABSTRACT

Genome-wide association studies (GWAS) have cataloged many significant associations between genetic variants and complex traits. However, most of these findings have unclear biological significance, because they often have small effects and occur in non-coding regions. Integration of GWAS with gene regulatory networks addresses both issues by aggregating weak genetic signals within regulatory programs. Here we develop a Bayesian framework that integrates GWAS summary statistics with regulatory networks to infer genetic enrichments and associations simultaneously. Our method improves upon existing approaches by explicitly modeling network topology to assess enrichments, and by automatically leveraging enrichments to identify associations. Applying this method to 18 human traits and 38 regulatory networks shows that genetic signals of complex traits are often enriched in interconnections specific to trait-relevant cell types or tissues. Prioritizing variants within enriched networks identifies known and previously undescribed trait-associated genes revealing biological and therapeutic insights.


Subjects
Gene Regulatory Networks , Genome-Wide Association Study/methods , Models, Genetic , Multifactorial Inheritance/genetics , Algorithms , Bayes Theorem , Computer Simulation , Data Mining , Genome, Human , Genome-Wide Association Study/statistics & numerical data , Humans , Polymorphism, Single Nucleotide , Transcription Factors/genetics
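
The Bayesian framework's exact model is not given in its abstract. A simplified way to see how a network enrichment can sharpen association calls is to give SNPs inside an enriched network a higher prior probability of association. That prior is then combined with per-SNP evidence from GWAS summary statistics. The sketch below measures that evidence with Wakefield's approximate Bayes factor, computed from an effect estimate and its standard error. The prior probabilities, the prior effect variance, and the function names are assumptions, and this is not the authors' model.

```python
import math


def approx_bayes_factor(beta, se, prior_var=0.04):
    """Wakefield-style approximate Bayes factor (association vs. none) from a
    GWAS effect estimate and its standard error. prior_var is the assumed
    prior variance W of a true effect."""
    v = se ** 2
    z_sq = (beta / se) ** 2
    shrink = prior_var / (v + prior_var)  # W / (V + W)
    return math.sqrt(1 - shrink) * math.exp(z_sq * shrink / 2)


def posterior_prob(beta, se, in_enriched_network, prior_in=1e-3, prior_out=1e-4):
    """Posterior probability of association; SNPs in an enriched network get a
    higher prior probability (both priors are hypothetical)."""
    prior = prior_in if in_enriched_network else prior_out
    bf = approx_bayes_factor(beta, se)
    return prior * bf / (prior * bf + 1 - prior)


# The same SNP (z = 5) inside vs. outside an enriched network
print(round(posterior_prob(0.05, 0.01, True), 3))   # → 0.929
print(round(posterior_prob(0.05, 0.01, False), 3))  # → 0.565
```

With these toy settings, identical GWAS evidence yields a posterior of roughly 0.93 inside an enriched network versus roughly 0.57 outside it. This is the intuition behind letting estimated enrichments inform which variants are prioritized.
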